Classifying Steam’s Successful Games
Michael Yang, Edward Yang, Sanjae Chin
Topic and Motivation (What Is Steam?)
- Largest digital distribution platform and storefront
- Thousands of games are listed on Steam but only a fraction are successful
- Key player in the video game industry
- Our Research Topic:
- Aim to find important features that classify success
Our Data
Where we got our data from:
SteamDB
Official Steam Site
SteamCharts
Scraped on February 28, 2024
Our key variables:
Genres
Publisher/Developer
Base Price
Success
Number of Positive Reviews
Number of Negative Reviews
Total Number of Reviews
Peak Player Count
Peak Daily Player Count
EDA Visualizations
About 18% (598) of games in our cleaned dataset are considered successful
About 70% (421) of successful games cost less than $20
- About 21% (128) of successful games were free
The average positive/negative review ratio is around 95%
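The percentages above can be re-derived from the raw counts on this slide. The short sketch below only redoes that arithmetic; the implied dataset size is an estimate, because the 18% figure is itself rounded.

```python
# Counts taken from the slide text.
successful = 598   # games labelled successful
under_20 = 421     # successful games priced below $20
free = 128         # successful games with a base price of $0

pct_under_20 = round(100 * under_20 / successful, 1)
pct_free = round(100 * free / successful, 1)

# Rough size of the cleaned dataset implied by "about 18%".
# This is an estimate only, since 18% is a rounded value.
implied_total = round(successful / 0.18)

print(f"Under $20: {pct_under_20}% of successful games")
print(f"Free: {pct_free}% of successful games")
print(f"Implied dataset size: about {implied_total} games")
```

Both shares use the 598 successful games as the denominator, which is what makes the 70% and 21% figures consistent with the counts in parentheses.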

EDA Genres
Most successful games are Indie (64%)
About 41% of games have Action-Adventure aspects
Indie Action-Adventure games are more likely to be successful compared to other genres

Methodology
- Logistic Regression
- Simplicity, interpretability, linearity
- Support Vector Machine (SVM)
- Effectiveness in handling high-dimensional data and complex decision boundaries
- Random Forest
- Ability to handle non-linear relationships
- Gradient Boosting Machine (GBM)
- Iterative improvement of model performance
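A minimal sketch of how the four model families listed above could be set up with scikit-learn. The data here is synthetic (`make_classification`), and the hyperparameters, the scaling step, and the 75/25 split are all assumptions for illustration, not the settings used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the Steam data: 4 numeric features and
# roughly 18% positives (assumed, to mimic the class imbalance).
X, y = make_classification(
    n_samples=400, n_features=4, n_informative=3, n_redundant=1,
    weights=[0.82, 0.18], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Logistic regression and the SVM are scale-sensitive, so they get a scaler;
# the tree-based models do not need one.
models = {
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "GBM": GradientBoostingClassifier(random_state=0),
}

# Fit every model and keep the predicted probability of the positive class.
test_scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_scores[name] = model.predict_proba(X_test)[:, 1]
```

Keeping the predicted probabilities, rather than hard 0/1 labels, is what allows a ranking metric such as ROC AUC to be computed afterwards.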
Results
We use the ROC_AUC score on the test dataset to compare how well each model classifies success
- ROC_AUC for SVM is 0.88
- ROC_AUC for LR is 0.92
- ROC_AUC for RF is 1.0
- ROC_AUC for GBM is 0.5
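To help read the scores above, here is a small sketch of how `roc_auc_score` from scikit-learn behaves on hand-made toy labels (not our data). It shows three reference points: an imperfect ranking, a perfect ranking (1.0), and a model that gives every game the same score (0.5, the chance level).

```python
from sklearn.metrics import roc_auc_score

# Toy ground truth: two unsuccessful (0) and two successful (1) games.
y_true = [0, 0, 1, 1]

# One successful game is scored below one unsuccessful game,
# so 3 of the 4 positive/negative pairs are ranked correctly.
auc_mixed = roc_auc_score(y_true, [0.1, 0.4, 0.35, 0.8])

# Every successful game is scored above every unsuccessful game.
auc_perfect = roc_auc_score(y_true, [0.1, 0.2, 0.8, 0.9])

# A constant score carries no ranking information at all.
auc_constant = roc_auc_score(y_true, [0.5, 0.5, 0.5, 0.5])

print(auc_mixed, auc_perfect, auc_constant)
```

A score of exactly 0.5 or exactly 1.0 on real data is unusual enough that it is worth checking what was passed to the metric (for example, constant predictions, or a feature that reveals the label) before trusting the comparison.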
Conclusions and Discussion
Currently, our model is generally doing “well” at classifying success based on our numerical columns (“BASE PRICE”, “ALL TIME PEAK”, “TOTAL REVIEWS”, “NEGATIVE REVIEWS”). However, once we added genre columns to our model, the AUC scores were always 1.0
We think this is possibly due to two reasons:
Our revised definition of success is now too simple
There is some one-to-one mapping from our genre columns to the outcome (success)
Going forward, we want to find where this could be originating from.
Depending on what we uncover:
We may try to change our definition of success
If the origin seems interesting, we may analyze it further
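One possible way to start looking for the source of the perfect AUC is to check whether any single genre column separates the outcome on its own. The sketch below uses a made-up toy table; the column names (`genre_indie`, `genre_action`, `genre_leaky`, `success`) are hypothetical and do not reflect our actual schema.

```python
import pandas as pd

# Hypothetical toy data: genre flags are one-hot columns,
# and genre_leaky is deliberately identical to the outcome.
games = pd.DataFrame({
    "genre_indie":  [1, 0, 1, 0, 1, 1],
    "genre_action": [1, 0, 1, 0, 1, 0],
    "genre_leaky":  [1, 1, 0, 0, 1, 0],
    "success":      [1, 1, 0, 0, 1, 0],
})

genre_cols = [c for c in games.columns if c.startswith("genre_")]

suspicious = []
for col in genre_cols:
    # Success rate among games with and without this genre flag.
    rates = games.groupby(col)["success"].mean()
    # If every group is all-success or all-failure, the column
    # alone determines the outcome in this sample.
    if rates.isin([0.0, 1.0]).all():
        suspicious.append(col)

print(suspicious)
```

A rare genre with only a handful of games can also be flagged by this check purely by chance, so group sizes should be looked at alongside the rates before drawing any conclusion.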